Abstract: Nowadays it is important to extract knowledge from both structured sources (e.g., relational databases and XML) and unstructured sources (e.g., text documents and images), and the resulting knowledge needs to be in a human-readable format. The large number of online repositories makes extracting useful knowledge necessary, and the extracted knowledge can then be used for further processing such as analysis. Domains such as biomedicine, e-commerce, and banking hold huge amounts of data that must be analysed to produce useful knowledge; in the biomedical domain, for example, analysis of clinical data is used to predict diseases, treatments, etc. Various techniques are used for knowledge extraction; in this proposed work, we use clustering. Clustering analysis is an important field of artificial intelligence and data mining. The basic idea is to use the words and characters in documents to measure the degree of similarity among them and to cluster the documents without prior knowledge. This paper introduces the proposed work, which clusters biomedical abstracts to extract hidden knowledge. The document clustering is based on a Genetic Algorithm (GA) that searches for the optimal cluster centres and performs simultaneous mutation. The proposed algorithm outperforms K-Means and a simple GA.
Keywords: Genetic Algorithm, Clustering, TF-IDF, Biomedical abstracts.
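The pipeline summarised above (TF-IDF document vectors, a similarity measure, and a GA searching for cluster centres) can be illustrated with a minimal sketch. It makes several assumptions not stated in the abstract: smoothed TF-IDF weighting, cosine similarity, centres restricted to actual documents (medoids), fitness defined as the summed similarity of each document to its nearest centre, truncation selection, union-based crossover, and a single-swap mutation. The function names (`tfidf_vectors`, `ga_cluster`, etc.) are hypothetical, and this is a plain GA; it does not reproduce the paper's simultaneous-mutation operator, whose details are not given here.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors as {term: weight}; smoothed idf keeps weights > 0.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fitness(centres, sim):
    # Sum over documents of the similarity to the most similar centre.
    return sum(max(sim[i][c] for c in centres) for i in range(len(sim)))


def ga_cluster(docs, k, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    # Chromosome = list of k distinct document indices used as cluster centres.
    rng = random.Random(seed)
    vecs = tfidf_vectors(docs)
    n = len(docs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, sim), reverse=True)
        survivors = pop[: pop_size // 2]  # keep the fitter half (elitism)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: draw k distinct centres from the union of both parents.
            child = rng.sample(sorted(set(p1) | set(p2)), k)
            # Mutation: replace one centre with a random non-centre document.
            if rng.random() < mutation_rate:
                outside = [d for d in range(n) if d not in child]
                if outside:
                    child[rng.randrange(k)] = rng.choice(outside)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda c: fitness(c, sim))
    # Assign each document to the index of its most similar centre.
    labels = [max(range(k), key=lambda j: sim[i][best[j]]) for i in range(n)]
    return best, labels


if __name__ == "__main__":
    docs = [
        "gene expression in breast cancer tumour cells",
        "breast cancer tumour gene mutation",
        "insulin therapy for type 2 diabetes patients",
        "diabetes patients insulin dosage trial",
    ]
    centres, labels = ga_cluster(docs, k=2)
    # The two cancer abstracts share one cluster; the two diabetes abstracts share the other.
    print(centres, labels)
```

Restricting centres to existing documents keeps every chromosome a small set of integers, which makes crossover and mutation trivial; a GA over real-valued centroid vectors, closer to K-Means, would be an equally valid design choice.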